AITopics | cluster index

Collaborating Authors

cluster index

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Summarizing Bayesian Nonparametric Mixture Posterior -- Sliced Optimal Transport Metrics for Gaussian Mixtures

Nguyen, Khai, Mueller, Peter

arXiv.org Machine LearningJan-2-2025

Existing methods to summarize posterior inference for mixture models focus on identifying a point estimate of the implied random partition for clustering, with density estimation as a secondary goal (Wade and Ghahramani, 2018; Dahl et al., 2022). We propose a novel approach for summarizing posterior inference in nonparametric Bayesian mixture models, prioritizing density estimation of the mixing measure (or mixture) as an inference target. One of the key features is the model-agnostic nature of the approach, which remains valid under arbitrarily complex dependence structures in the underlying sampling model. Using a decision-theoretic framework, our method identifies a point estimate by minimizing posterior expected loss. A loss function is defined as a discrepancy between mixing measures. Estimating the mixing measure implies inference on the mixture density and the random partition. Exploiting the discrete nature of the mixing measure, we use a version of sliced Wasserstein distance. We introduce two specific variants for Gaussian mixtures. The first, mixed sliced Wasserstein, applies generalized geodesic projections on the product of the Euclidean space and the manifold of symmetric positive definite matrices. The second, sliced mixture Wasserstein, leverages the linearity of Gaussian mixture measures for efficient projection.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2411.14674

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Virginia > Alexandria County > Alexandria (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph

Karamete, B. Kaan, Glaser, Eli

arXiv.org Artificial IntelligenceJul-22-2024

This paper discusses how to generate general graph node embeddings from knowledge graph representations. The embedded space is composed of a number of sub-features to mimic both local affinity and remote structural relevance. These sub-feature dimensions are defined by several indicators that we speculate to catch nodal similarities, such as hop-based topological patterns, the number of overlapping labels, the transitional probabilities (markov-chain probabilities), and the cluster indices computed by our recursive spectral bisection (RSB) algorithm. These measures are flattened over the one dimensional vector space into their respective sub-component ranges such that the entire set of vector similarity functions could be used for finding similar nodes. The error is defined by the sum of pairwise square differences across a randomly selected sample of graph nodes between the assumed embeddings and the ground truth estimates as our novel loss function. The ground truth is estimated to be a combination of pairwise Jaccard similarity and the number of overlapping labels. Finally, we demonstrate a multi-variate stochastic gradient descent (SGD) algorithm to compute the weighing factors among sub-vector spaces to minimize the average error using a random sampling logic.

graph, node, vector, (16 more...)

arXiv.org Artificial Intelligence

2407.15906

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > Maryland > Baltimore (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.46)
Transportation > Electric Vehicle (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Add feedback

Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

Hoang, Bao, Pang, Yijiang, Liang, Siqi, Zhan, Liang, Thompson, Paul, Zhou, Jiayu

arXiv.org Artificial IntelligenceJun-16-2024

Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy that guarantees sufficient clinical diversity, determined by the decentralized nature of medical data. However, data from various sites are easily biased by the local environment or facilities, thereby violating the i.i.d. rule. A common strategy is to harmonize the site bias while retaining important biological information. The ComBat is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when faced with situations involving newly joined sites in training or evaluating data from unknown/unseen sites, ComBat lacks compatibility and requires retraining with data from all the sites. The retraining leads to significant computational and logistic overhead that is usually prohibitive. In this work, we develop a novel Cluster ComBat harmonization algorithm, which leverages cluster patterns of the data in different sites and greatly advances the usability of ComBat harmonization. We use extensive simulation and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach. Our codes are provided in https://github.com/illidanlab/distributed-cluster-harmonization.

cluster combat, combat, harmonization, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3637528.3671590

2405.15081

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
North America > United States > Michigan > Ingham County > Lansing (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

Cluster-based Sampling in Hindsight Experience Replay for Robotic Tasks (Student Abstract)

Kim, Taeyoung, Har, Dongsoo

arXiv.org Artificial IntelligenceJan-10-2024

In multi-goal reinforcement learning with a sparse binary reward, training agents is particularly challenging, due to a lack of successful experiences. To solve this problem, hindsight experience replay (HER) generates successful experiences even from unsuccessful ones. However, generating successful experiences from uniformly sampled ones is not an efficient process. In this paper, the impact of exploiting the property of achieved goals in generating successful experiences is investigated and a novel cluster-based sampling strategy is proposed. The proposed sampling strategy groups episodes with different achieved goals by using a cluster model and samples experiences in the manner of HER to create the training batch. The proposed method is validated by experiments with three robotic control tasks of the OpenAI Gym. The results of experiments demonstrate that the proposed method is substantially sample efficient and achieves better performance than baseline approaches.

cluster model, replay buffer, successful experience, (10 more...)

arXiv.org Artificial Intelligence

2208.14741

Country: Asia > South Korea > Daejeon > Daejeon (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Clustering Indices based Automatic Classification Model Selection

Santhiappan, Sudarsun, Shravan, Nitin, Ravindran, Balaraman

arXiv.org Artificial IntelligenceMay-23-2023

Classification model selection is a process of identifying a suitable model class for a given classification task on a dataset. Traditionally, model selection is based on cross-validation, meta-learning, and user preferences, which are often time-consuming and resource-intensive. The performance of any machine learning classification task depends on the choice of the model class, the learning algorithm, and the dataset's characteristics. Our work proposes a novel method for automatic classification model selection from a set of candidate model classes by determining the empirical model-fitness for a dataset based only on its clustering indices. Clustering Indices measure the ability of a clustering algorithm to induce good quality neighborhoods with similar data characteristics. We propose a regression task for a given model class, where the clustering indices of a given dataset form the features and the dependent variable represents the expected classification performance. We compute the dataset clustering indices and directly predict the expected classification performance using the learned regressor for each candidate model class to recommend a suitable model class for dataset classification. We evaluate our model selection method through cross-validation with 60 publicly available binary class datasets and show that our top3 model recommendation is accurate for over 45 of 60 datasets. We also propose an end-to-end Automated ML system for data classification based on our model selection method. We evaluate our end-to-end system against popular commercial and noncommercial Automated ML systems using a different collection of 25 public domain binary class datasets. We show that the proposed system outperforms other methods with an excellent average rank of 1.68.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2305.13926

Country:

Europe (0.46)
Asia (0.28)

Genre: Research Report > Experimental Study (0.93)

Add feedback

Why Are Conditional Generative Models Better Than Unconditional Ones?

Bao, Fan, Li, Chongxuan, Sun, Jiacheng, Zhu, Jun

arXiv.org Artificial IntelligenceDec-1-2022

Extensive empirical evidence demonstrates that conditional generative models are easier to train and perform better than unconditional ones by exploiting the labels of data. So do score-based diffusion models. In this paper, we analyze the phenomenon formally and identify that the key of conditional learning is to partition the data properly. Inspired by the analyses, we propose self-conditioned diffusion models (SCDM), which is trained conditioned on indices clustered by the k-means algorithm on the features extracted by a model pre-trained in a self-supervised manner. SCDM significantly improves the unconditional model across various datasets and achieves a record-breaking FID of 3.94 on ImageNet 64x64 without labels. Besides, SCDM achieves a slightly better FID than the corresponding conditional model on CIFAR10.

artificial intelligence, learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2212.00362

Country:

Asia > China > Beijing > Beijing (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

PathFinder: Discovering Decision Pathways in Deep Neural Networks

İrsoy, Ozan, Alpaydın, Ethem

arXiv.org Artificial IntelligenceOct-1-2022

Explainability is becoming an increasingly important topic for deep neural networks. Though the operation in convolutional layers is easier to understand, processing becomes opaque in fully-connected layers. The basic idea in our work is that each instance, as it flows through the layers, causes a different activation pattern in the hidden layers and in our Paths methodology, we cluster these activation vectors for each hidden layer and then see how the clusters in successive layers connect to one another as activation flows from the input layer to the output. We find that instances of the same class follow a small number of cluster sequences over the layers, which we name ``decision paths." Such paths explain how classification decisions are typically made, and also help us determine outliers that follow unusual paths. We also propose using the Sankey diagram to visualize such pathways. We validate our method with experiments on two feed-forward networks trained on MNIST and CELEB data sets, and one recurrent network trained on PenDigits.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2210.00319

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Optimal Clustering with Bandit Feedback

Yang, Junwen, Zhong, Zixin, Tan, Vincent Y. F.

arXiv.org Machine LearningFeb-9-2022

This paper considers the problem of online clustering with bandit feedback. A set of arms (or items) can be partitioned into various groups that are unknown. Within each group, the observations associated to each of the arms follow the same distribution with the same mean vector. At each time step, the agent queries or pulls an arm and obtains an independent observation from the distribution it is associated to. Subsequent pulls depend on previous ones as well as the previously obtained samples. The agent's task is to uncover the underlying partition of the arms with the least number of arm pulls and with a probability of error not exceeding a prescribed constant $\delta$. The problem proposed finds numerous applications from clustering of variants of viruses to online market segmentation. We present an instance-dependent information-theoretic lower bound on the expected sample complexity for this task, and design a computationally efficient and asymptotically optimal algorithm, namely Bandit Online Clustering (BOC). The algorithm includes a novel stopping rule for adaptive sequential testing that circumvents the need to exactly solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.

optimal clustering, partition, zhong, (15 more...)

arXiv.org Machine Learning

2202.04294

Country:

Asia > Singapore > Central Region > Singapore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

#artificialintelligenceApr-10-2018, 15:56:49 GMT

Associations are shown with cluster indices, which summarize properties of clusters derived from affinity propagation clusters of the TIL map--properties that provide details on local structure beyond simple densities. The Ball-Hall index is a particular clustering index, summarizing the mean, through all the clusters, of their mean dispersion and is equivalent to the mean of the squared distances of the points of the cluster with respect to its center. In our data, the Ball-Hall index is correlated (ρSpearman 0.95) with the mean cluster extent, CE. Significance test p value is shown in the lower left. The Banfield-Raftery index is the weighted sum of the logarithms of the mean cluster dispersion and, in our data, often correlates with the number of clusters.

pathology image, spatial organization and molecular correlation, tumor-infiltrating lymphocyte, (5 more...)

#artificialintelligence

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

Filters

Collaborating Authors

cluster index

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

7b39f4512a2e3899edcc59c7501f3cd4-Supplemental-Conference.pdf

Summarizing Bayesian Nonparametric Mixture Posterior -- Sliced Optimal Transport Metrics for Gaussian Mixtures

An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph

Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

Cluster-based Sampling in Hindsight Experience Replay for Robotic Tasks (Student Abstract)

Clustering Indices based Automatic Classification Model Selection

Why Are Conditional Generative Models Better Than Unconditional Ones?

PathFinder: Discovering Decision Pathways in Deep Neural Networks

Optimal Clustering with Bandit Feedback

Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images